Search results for "Audio signal processing"
Showing 10 of 18 documents
Real-time signal processing in embedded systems
2016
On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization
2017
The growing interest in incorporating new features into mobile devices has increased the number of signal processing applications running on processors designed for mobile computing. A challenging signal processing field is acoustic source localization, which is attractive for applications such as automatic camera steering systems, human-machine interfaces, video gaming, and audio surveillance. In this context, the emergence of systems-on-chip (SoC) that contain a small graphics accelerator (GPU) contributes a notable increase in computational capacity while partially retaining the appealing low power consumption of embedded systems. This is the case, for example, of the Sam…
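Acoustic source localization pipelines like the one this abstract describes commonly build on time-difference-of-arrival (TDOA) estimation between microphone pairs. A minimal GCC-PHAT sketch (an illustrative building block, not the paper's implementation):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the TDOA (seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with phase transform."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15          # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)
```

With two or more microphone pairs, the resulting TDOAs can be intersected geometrically to estimate the source direction; the GPU acceleration discussed in the paper typically targets the FFTs above.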
2015
Visuo-auditory sensory substitution systems are augmented reality devices that translate a video stream into an audio stream in order to help the blind in daily tasks requiring visuo-spatial information. In this work, we present both a new mobile device and a transcoding method specifically designed to sonify moving objects. Frame differencing is used to extract spatial features from the video stream and two-dimensional spatial information is converted into audio cues using pitch, interaural time difference and interaural level difference. Using numerical methods, we attempt to reconstruct visuo-spatial information based on audio signals generated from various video stimuli. We show that de…
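The mapping described above (vertical position to pitch, horizontal position to interaural time and level differences) can be sketched as follows; all frequency ranges, ITD limits, and gain laws here are illustrative choices, not the device's actual transcoding parameters:

```python
import numpy as np

def sonify_point(x, y, width, height, fs=44100, dur=0.2):
    """Map a 2-D image position to a stereo tone:
    vertical position -> pitch, horizontal -> ITD and ILD.
    Parameter ranges are hypothetical, for illustration only."""
    # Pitch: top of the image maps to a higher frequency (200-1600 Hz).
    f = 200.0 * (2.0 ** (3.0 * (1.0 - y / height)))
    t = np.arange(int(fs * dur)) / fs
    tone = np.sin(2 * np.pi * f * t)
    # Horizontal position in [-1, 1]; negative = left.
    pan = 2.0 * x / width - 1.0
    itd = 0.0007 * pan                    # up to ~0.7 ms, roughly head-sized
    shift = int(round(abs(itd) * fs))
    left = np.sqrt(0.5 * (1 - pan)) * tone    # equal-power ILD
    right = np.sqrt(0.5 * (1 + pan)) * tone
    if pan > 0:    # source on the right: the left ear lags
        left = np.concatenate((np.zeros(shift), left))[: len(t)]
    elif pan < 0:
        right = np.concatenate((np.zeros(shift), right))[: len(t)]
    return np.stack((left, right), axis=1)
```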
On the Design of Probe Signals in Wireless Acoustic Sensor Networks Self-Positioning Algorithms
2018
A wireless acoustic sensor network comprises a distributed group of devices equipped with audio transducers. Typically, these devices can interoperate with each other using wireless links and perform collaborative audio signal processing. Ranging and self-positioning of the network nodes are examples of tasks that can be carried out collaboratively using acoustic signals. However, the environmental conditions can distort the emitted signals and complicate the ranging process. In this context, the selection of proper acoustic signals can facilitate the attainment of this goal and improve the localization accuracy. This letter deals with the design and evaluation of acoustic probe signals all…
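A common family of probe signals for acoustic ranging is the linear chirp, detected at the receiver with a matched filter; a minimal sketch under assumed parameters (sample rate, band, and speed of sound are illustrative, not the letter's design):

```python
import numpy as np

def linear_chirp(f0, f1, dur, fs):
    """Linear frequency sweep from f0 to f1 over `dur` seconds."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / dur * t ** 2))

def estimate_range(received, probe, fs, c=343.0):
    """Matched-filter ranging: peak lag of the cross-correlation
    gives the propagation delay, converted to metres."""
    corr = np.correlate(received, probe, mode="full")
    lag = np.argmax(np.abs(corr)) - (len(probe) - 1)
    return lag / fs * c
```

Chirps are attractive probes because their autocorrelation is sharply peaked, which keeps the lag estimate robust against the environmental distortions the abstract mentions.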
Video preprocessing for audiovisual indexing
2003
We address the problem of detecting shots of subjects that are interviewed in news sequences. This is useful because such scenes usually contain important, reusable information for other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences, which allowed a fast search of news stories (see Albiol, A. et al., 3rd Int. Conf. on Audio and Video-based Biometric Person Authentication, p.366-71, 2001). We now present a new shot descriptor technique which improves the previous search results by using a simple yet efficient algorithm based on the information contained in consecutive fra…
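Shot-level preprocessing of this kind often starts from inter-frame differences; a minimal sketch (the threshold and the plain mean-difference metric are illustrative stand-ins, not the paper's descriptor):

```python
import numpy as np

def shot_boundaries(frames, threshold=0.3):
    """Flag candidate shot boundaries where the mean absolute
    difference between consecutive frames exceeds a (hypothetical)
    threshold. `frames`: array (n_frames, H, W) with values in [0, 1].
    Returns the indices of the first frame of each new shot."""
    diffs = np.mean(np.abs(np.diff(frames.astype(float), axis=0)), axis=(1, 2))
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]
```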
Decoding Children's Social Behavior
2013
We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe met…
The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.
2014
We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant–vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG–MEG recording of mismatch negativity (MMN/MMNm). Overall, speech–sound processing was enhanced compared to nonspeech sound processing. This effect was strongest for changes which affect word meaning (consonant, vowel, and vowel duration) in the left and for the vowel identity change in the right hemisphere also. Furthermore, in the right hemisphere, spe…
The indexing of persons in news sequences using audio-visual data
2004
We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that seeks anchor shots, where the person on the scene is likely speaking. The system has been extensively tested on several news sequences with very good recognition rates.
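Combining audio and video confidence values as described above is a form of late fusion; one minimal sketch is a convex combination with a decision threshold (the weight and threshold below are illustrative, not the system's tuned values):

```python
def fuse_confidences(audio_conf, video_conf, w_audio=0.5):
    """Late fusion of speaker- and face-recognition confidences
    by a convex combination; the weight is a hypothetical choice."""
    return w_audio * audio_conf + (1 - w_audio) * video_conf

def is_target_person(audio_conf, video_conf, threshold=0.6):
    """Accept the identity hypothesis when the fused score
    clears a (hypothetical) decision threshold."""
    return fuse_confidences(audio_conf, video_conf) >= threshold
```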
Capturing and Indexing Rehearsals: The Design and Usage of a Digital Archive of Performing Arts
2015
Preserving the cultural heritage of the performing arts raises difficult and sensitive issues, as each performance is unique by nature and the juxtaposition between the performers and the audience cannot be easily recorded. In this paper, we report on an experimental research project to preserve another aspect of the performing arts: the history of their rehearsals. We have specifically designed non-intrusive video recording and on-site documentation techniques to make this process transparent to the creative crew, and have developed a complete workflow to publish the recorded video data and their corresponding meta-data online as Open Data using state-of-the-art audi…
Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network
2020
In this paper, we propose a model for the Environment Sound Classification task (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with an attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant-Q Transform (CQT) and Chromagram. Such a combination of features has not previously been used for signal or audio processing. We also employ a deeper CNN (DCNN) than previous models, consisting of spatially separable convolutions working on the time and feature domains separately. Alongside, we use atten…
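The multi-channel input described above amounts to stacking several time-frequency views of the same clip into one tensor for the CNN. A minimal sketch of that assembly step, using two simple stand-in channels (linear and log magnitude spectrograms) in place of the paper's MFCC/GFCC/CQT/chroma features:

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude STFT via a sliding Hann window (minimal sketch)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def feature_channels(x):
    """Stack several spectral views of one clip into a
    (channels, time, freq) tensor, analogous to the paper's
    multi-channel CNN input. The two channels here are
    illustrative stand-ins for the actual feature set."""
    mag = stft_mag(x)
    return np.stack([mag, np.log1p(mag)])
```

In practice each feature type would be computed at a matching time-frequency resolution so all channels share one shape before stacking.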